Divergence analysis and processing for Mandarin-English parallel text exploitation
Previous work shows that exploiting parallel text to extract mappings between language pairs improves translation capability. However, while this process can be fully automated, one thorny problem called “divergence” hampers mapping extraction. Therefore, this paper discusses the issues of parallel text exploitation in general, with special emphasis on divergence analysis and processing. In experiments on a Mandarin-English travel conversation corpus of 11,885 sentence pairs, the perplexity of the alignments under the IBM translation model is reduced on average from 13.65 to 4.18.
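For readers unfamiliar with the metric, perplexity is commonly computed as 2 raised to the negative average log2 probability that a model assigns to each item. The sketch below is only an illustration of that standard definition; the function name and the input probabilities are hypothetical, and the abstract does not state exactly how the paper computes perplexity over IBM-model alignments.

```python
import math


def perplexity(token_probs):
    """Perplexity of a sequence, given the model probability of each item.

    token_probs is a hypothetical list of per-word probabilities in (0, 1];
    it is not taken from the paper.
    """
    if not token_probs:
        raise ValueError("need at least one probability")
    # Average log2 probability per item (a negative number or zero).
    avg_log2 = sum(math.log2(p) for p in token_probs) / len(token_probs)
    # Lower perplexity means the model is less "surprised" by the data.
    return 2 ** (-avg_log2)
```

Under this definition, a model that assigns every item a probability of 0.25 has a perplexity of 4, so a drop from 13.65 to 4.18 corresponds to the model being far more certain about each alignment decision.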
A Large-Vocabulary Bilingual Speech Recognition System for Chinese and Japanese Language
Bilingual and multilingual speech recognition is becoming an attractive research topic because bilingual writing now appears almost everywhere. In this paper, we propose a continuous word-based speech recognition system that dictates Mandarin and Japanese speech simultaneously. We find that there are about 62 basic phoneme-like units (PLUs) among the mixed Mandarin and Japanese syllables. The 62 HMMs are used to decode the input speech into word hypotheses based on a fast tree-beam search algorithm. In the language model, bigram and trigram models are used to select the most likely word from the word candidates. We also use a bilingual dictionary to handle cross-language information. The proposed system architecture can not only dictate Mandarin and Japanese speech simultaneously but also provides a possible solution for recognizing other bilingual speech.